Corpus-based methods and hand-built methods

Author

  • Richard Sproat
Abstract

The recent success of statistical corpus-based methods in a variety of areas of speech and language processing has led to the widespread view that traditional hand-built “rule-based” approaches are moribund. This is a misconception. As I shall argue in this talk, it is unlikely that rule-based approaches will ever be eliminated. Two examples are given to support this conclusion: one where the linguistic facts, though highly complex, are basically quite regular; and another where the linguistic fact is exceedingly simple (hence hardly worth the effort of inferring from data), but where adding this information can improve the output of a statistical model.

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or finding all expressions that refer to the same entity in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Provide a Model for Shaping the Subject in Comparative Studies and Research in the Field of Art With Emphasis on Interdisciplinary Studies

Consideration of comparative research as a "separate and different research process" is an issue that has not been addressed thoroughly, at least in Iran, and few of the research conducted under the title of "comparative" refer to studies conducted using different methods than the usual research methods. On the other hand, there has been a rise in the importance of interaction between different...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research has been done on clustering English documents. The question now arises: can those results be extended to other languages? Since the goal of document clustering is grouping of docum...

Relations between "Cognitive Characteristics" and "Spatial Configuration" of the Built Environment: An Experience in Dezful

In an urban built environment, on the one hand, people behave based on their spatial cognition of the environment: spatial behavior in interaction with the environment depends on spatial cognition. On the other hand, much research has also pointed out that spatial configuration, as the relational characteristics of the physical elements of the environment, influences spatial cognition. Based o...

Unsupervised Named Entity Classification Models and their Ensembles

This paper proposes an unsupervised learning model for classifying named entities. This model uses a training set built automatically from a small-scale named entity dictionary and an unlabeled corpus. This enables us to classify named entities without the cost of building a large hand-tagged training corpus or a large number of rules. Our model uses the ensemble of three different learning met...

Publication date: 2000